So far we have used linear regressions, where we modeled the outcome as follows:
\[ y \sim Normal(\mu,sigma) \\ \mu = \alpha + \beta X \]
This regression works generally well, even if \(\small y\) is not normally distributed. (The residuals should be though. And a posterior predictive plot is always useful!)
However, for some outcome types a linear regression simply does not work. This is particularity clear when the outcome is bound to be equal to or larger than zero (like counts that “clump” at zero) or when the outcome is binary.
As an example the next figure shows end of year math grades in a Portuguese school.1.
df=read.table("data/student-mat.csv",sep=";",header=TRUE)
df = df[df$Medu>0,]
set_par()
x = barplot(c(table(df$G3),0,0,0), xaxt = "n", border = "grey")
axis(1,at = x[seq(0,20,2)+1], labels = seq(0,20,2))
In these situations we can use this kind of model:
\[ y \sim dist(\theta_1,\theta_2) \\ \]
Here \(\small dist(\theta_1,\theta_2)\) is a distribution with two parameters2 that is consistent with the observed data.
Choosing a different distribution means choosing a different
likelihood function. That is, we exchange dnomr
with an alternative distribution that is appropriate for the data.
Here are a few examples:
set_par()
x = seq(0,5,1)
plot(x-.2,dbinom(x, 5, .2),"h", xlim = c(0,20), lwd = 2,
ylab = "probability mass", xlab = "number of success")
x = seq(0,10,1)
lines(x,dbinom(x, 10, .5),"h", col = "blue", lwd = 2)
x = seq(0,20,1)
lines(x+.2,dbinom(x, 20, .75),"h", col = "red", lwd = 2)
legend("topright",
lwd = 2, col = c("black","blue","red"),
legend = c(
expression(theta[1]~" = n = 5, "~theta[2]~" = p = .2"),
expression(theta[1]~" = n = 10, "~theta[2]~" = p = .5"),
expression(theta[1]~" = n = 30, "~theta[2]~" = p = .75")
),
bty = "n")
set_par()
x = seq(0,5,1)
plot(x-.2,dpois(x, .5),"h", xlim = c(0,20), lwd = 2,
ylab = "probability mass", xlab = "number of events")
x = seq(0,10,1)
lines(x,dpois(x, 5),"h", col = "blue", lwd = 2)
x = seq(0,20,1)
lines(x+.2,dpois(x, 10),"h", col = "red", lwd = 2)
legend("topright",
lwd = 2, col = c("black","blue","red"),
legend = c(
expression(theta[1]~" = "~lambda~" = p = .5"),
expression(theta[1]~" = "~lambda~" = p = 5"),
expression(theta[1]~" = "~lambda~" = p = 10")
),
bty = "n")
As can be seen from the previous plots, the expected value (the mean parameter) of the Binomial and Poisson distributions are constrained:
But if we would just model e.g. \[ p = \alpha + \beta X \] we could get values between minus and plus infinity. Therefore the generalized linear model also used link functions that map the result of the linear predictor \(\small \alpha + \beta X\) to the desired range:
\[ p = link(\alpha + \beta X) \]
GLMs use different link functions for different distributions to implement different constraints:
\[ \small inv.logit(x) = \frac{1}{1+exp(-x)} = \frac{exp(x)}{1+exp(x)} \]
set_par()
curve(boot::inv.logit(x),-5.1,5.1,n = 100,
xlab = expression(alpha~" + "~beta~"X"), ylab = "p")
set_par()
curve(exp(x),-3,4,n = 100,
xlab = expression(alpha~" + "~beta~"X"), ylab = expression(lambda))
One important effect of link functions is that we cannot interpret regression weights in the same way as for simple linear regression. In a nutshell, if link functions exponentiate the linear predictor, regression weights represent multiplicative rather than additive effects.
In R, family is the term for an object that
tells a regression model both what outcome distribution to use and what
link function to use.
When do we use the Binomial and when the Poisson Likelihood?
For a hands on example for logistic regression we will try to predict failing the math class with the Portugese school data shown above. In particular, we will use these predictors:
It is always a good idea to plot the data first:
data.list = list(
Medu = df$Medu,
failures = df$failures,
fail = df$G3 == 0
)
set_par(mfrow = c(1,2))
table(data.list$failures) %>% barplot(border = "grey", xlab = "number fails")
table(data.list$Medu) %>% barplot(border = "grey", ylab = "count", xlab = "maternal edu")
We are centering Medu, so that the intercept measures
odds of “middle high” maternal education.
data.list$cMedu = data.list$Medu-2.5
We begin the analysis with an intercept model only. Especially if one is not familiar with a model, it is good to start simple and add complexity step by step.
model = alist(
fail ~ dbinom(1,p),
logit(p) <- a,
a ~ dnorm(0,10)
)
And we estimate the model with ulam (i.e. Stan):
u.fit_I = ulam(
model,
data = data.list,
iter = 2000, # 2000 iterations, (1000 warmup)
chains = 4, # four chains, to check convergence
cores = 4, # use 4 cores in parallel
cmdstan = TRUE) # use cmdstanr not rstan
We extract the prior and plot the prior prediction for the failure rate:
prior = extract.prior(u.fit_I)
set_par()
hist(inv_logit(prior$a), main="", breaks = 30)
To see what is going on here, lets overlay the logistic function on the prior:
set_par(mfrow = c(1,2), mar=c(3,3,3,3), cex = 1)
curve(dnorm(x,0,10),-30,30, ylab = "prior density", xlab = "a")
axis(4,at = seq(0,0.04, length.out = 5),
labels = seq(0,1,length.out = 5), col = "red")
curve(inv_logit(x)/25, add = T, col = "red")
abline(h = .1/25, lty = 3, col = "grey")
title("a ~ dnorm(0,10)")
curve(dnorm(x,0,2),-6,6, ylab = "prior density", xlab = "a")
axis(4,at = seq(0,0.2, length.out = 5),
labels = seq(0,1,length.out = 5), col = "red")
curve(inv_logit(x)/5, add = T, col = "red")
abline(h = .1/5, lty = 3, col = "grey")
title("a ~ dnorm(0,2)")
With a wide prior, most of the probability mass is either smaller than -5, leading to p very close to 0, or larger than 5, leading to p close to 1. Hence we will use a tighter prior on a.
How about the regression coefficients \(\small beta\)? Regression coefficients are log odds ratios.
For instance, lets assume the following:
then the odds ratio is :
\[ \begin{align*} OR = & \frac{\textrm{failure odds with no prior fail}}{\textrm{failure odds with prior fail}} \\ = & \frac{\frac{2}{98}}{\frac{4}{96}} = \frac{.0204}{.0417} = .49 \end{align*} \]
Note that this specific odds ratio comes close to the risk ratio of \(\small .2 / .4 = .5\). This is, however, not generally the case. If the probabilities are far away from zero, risk ratio and odds ratio do not align! We can see this if we just multiply the probability of failure by 15 in both groups: \[ \begin{align*} OR = \frac{\frac{30}{70}}{\frac{60}{40}} = \frac{.428}{.667} = .29 \end{align*} \]
Lets get back to specifying our prior for \(\small \beta\):
dnorm(0,.7) prior.dnorm(0,2) prior.Lets specify such a model and look at the prior predictions for the difference between two levels of education.
model = alist(
fail ~ dbinom(1,p),
logit(p) <- a + b1*failures,
a ~ dnorm(0,2),
b1 ~ dnorm(0,2)
)
u.fit_F = ulam(
model,
data = data.list,
iter = 2000, # 2000 iterations, (1000 warmup)
chains = 4, # four chains, to check convergence
cores = 4, # use 4 cores in parallel
cmdstan = TRUE) # use cmdstanr not rstan
prior = extract.prior(u.fit_F)
Now we can use the link_ulam function and new data to
generate predictions from the prior.
First we look at the effect of a one level change of education.
p = link_ulam(
u.fit_F, post = prior,
data = list(failures = c(0,1)))
set_par(mfrow = c(1,2), mar=c(3,3,3,3), cex = 1)
hist(p[,2]-p[,1],
main = "risk difference", xlab = "P(fail|high) - P(fail|low)")
hist(p[,2]/p[,1], breaks = 30,
main = "risk ratio", xlab = "P(fail|high) / P(fail|low)")
These differences and ratios are pretty large, so it is safe to make the prior a bit narrower and set the standard deviation for the regression weights to 1.
Here is the our model so far, which include previous class fails and maternal education as predictors:
model = alist(
fail ~ dbinom(1,p),
logit(p) <- a + b1*failures + b2*cMedu,
a ~ dnorm(0,2),
b1 ~ dnorm(0,1),
b2 ~ dnorm(0,1)
)
u.fit_FE = ulam(
model,
data = data.list,
log_lik = TRUE, # for model comparison
iter = 2000, # 2000 iterations, (1000 warmup)
chains = 4, # four chains, to check convergence
cores = 4, # use 4 cores in parallel
cmdstan = TRUE) # use cmdstanr not rstan
Some quick convergence diagnostics:
precis(u.fit_FE, depth = 2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a -2.5825926 0.2208076 -2.9475630 -2.24703945 2012.444 1.001705
## b1 0.6923649 0.1726043 0.4124413 0.96734347 2240.652 1.002301
## b2 -0.3014267 0.1659922 -0.5661254 -0.03307494 3014.866 1.000280
Which predictions one generates from the prior depends on research questions and domain knowlegde. My approach is to check if prior-predicted data are broadly consistent with the overall distribution of the data and to check if effect estimnates fall into a plausible range. The latter is easier for gaussian models and harder for models with exponential link functions. One simple rule of thumb for logistic or binomial regressions is that with a \(\small Normal(0,1)\) prior on regression coefficients the the 97.5 percentile is 1.96. If we exponentiate this we get 7, wich means that a \(\small Normal(0,1)\) prior expresses the pior information that the OR is probably not much larger then 7. Note that this prior would still be overwhelmed if we have a reasonable amount (N>25)3 of data.
To see if our model describes the data well, we plot model predicted and observed data together:
aggr.fun = function(x) return(c(mean = mean(x), N = length(x)))
obs =
aggregate(
data.list$fail,
by = with(data.list,
data.frame(cMedu = cMedu, Medu = Medu, failures = failures)),
aggr.fun
)
obs = cbind(obs,obs$x)
sim.data =
unique(as.data.frame(data.list)[,c(2,4)])
p = link_ulam(
u.fit_FE,
data = sim.data)
pp.stats = cbind(
sim.data,
m = colMeans(p),
t(apply(p,2,PI))
)
set_par(mfrow = c(2,2), mar=c(3,3,1.5,0.5), cex = .75)
tmp =
lapply(unique(obs$cMedu), function(x) {
plot(obs[obs$cMedu == x,"failures"],
obs[obs$cMedu == x,"mean"],
cex = sqrt(obs[obs$cMedu == x,"N"]),
xlim = c(-.5,3.5), ylim = c(0,.5),
ylab = "proportion fail",
xlab = "# past failures",
xaxt = "n", pch = 16, col = "grey")
title(paste("maternal edu",x+2.5),line = 0.5, cex.main = 1)
axis(1,at = 0:3)
points(pp.stats[pp.stats$cMedu == x,"failures"],
pp.stats[pp.stats$cMedu == x,"m"], pch = 16, col = "blue")
arrows(pp.stats[pp.stats$cMedu == x,"failures"],
y0 = pp.stats[pp.stats$cMedu == x,"5%"],
y1 = pp.stats[pp.stats$cMedu == x,"94%"],
length = 0, col = "blue")
})
This does not look great. We can try to improve the model in two ways:
We are using the indexing approach discusses in the book for this.
model = alist(
fail ~ dbinom(1,p),
logit(p) <- a[Medu] + b[Medu]*failures,
a[Medu] ~ dnorm(0,2),
b[Medu] ~ dnorm(0,1)
)
u.fit_FE.2 = ulam(
model,
data = data.list,
log_lik = TRUE, # for model comparison
iter = 2000, # 2000 iterations, (1000 warmup)
chains = 4, # four chains, to check convergence
cores = 4, # use 4 cores in parallel
cmdstan = TRUE) # use cmdstanr not rstan
Did the chains converge?
precis(u.fit_FE.2, depth = 2) %>% round(2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a[1] -2.17 0.47 -2.94 -1.44 2831.32 1
## a[2] -2.10 0.33 -2.66 -1.60 3402.34 1
## a[3] -2.85 0.45 -3.60 -2.18 2801.48 1
## a[4] -3.01 0.41 -3.70 -2.40 4075.78 1
## b[1] 0.53 0.31 0.04 1.00 2737.27 1
## b[2] 0.59 0.32 0.08 1.10 3472.30 1
## b[3] 0.82 0.31 0.32 1.31 3019.55 1
## b[4] 0.01 0.74 -1.26 1.11 3258.11 1
This looks OK, so we do again a posterior predictive check:
sim.data =
unique(as.data.frame(data.list)[,c(1,2)])
p = link_ulam(
u.fit_FE.2,
data = sim.data)
pp.stats = cbind(
sim.data,
m = colMeans(p),
t(apply(p,2,PI))
)
set_par(mfrow = c(2,2), mar=c(3,3,1.5,0.5), cex = .75)
tmp =
lapply(unique(obs$Medu), function(x) {
plot(obs[obs$Medu == x,"failures"],
obs[obs$Medu == x,"mean"],
cex = sqrt(obs[obs$Medu == x,"N"]),
xlim = c(-.5,3.5), ylim = c(0,.5),
ylab = "proportion fail",
xlab = "# past failures",
xaxt = "n", pch = 16, col = "grey")
title(paste("maternal edu",x),line = 0.5, cex.main = 1)
axis(1,at = 0:3)
points(pp.stats[pp.stats$Medu == x,"failures"],
pp.stats[pp.stats$Medu == x,"m"], pch = 16, col = "blue")
arrows(pp.stats[pp.stats$Medu == x,"failures"],
y0 = pp.stats[pp.stats$Medu == x,"5%"],
y1 = pp.stats[pp.stats$Medu == x,"94%"],
length = 0, col = "blue")
})
This does not look that much better. Let’s check this with PSIS-LOO:
compare(u.fit_FE.2,u.fit_FE, func = "PSIS")
## PSIS SE dPSIS dSE pPSIS weight
## u.fit_FE 232.7150 25.44361 0.000000 NA 2.904342 0.97496306
## u.fit_FE.2 240.0391 25.55773 7.324094 3.263811 7.150832 0.02503694
In this case the added complexity has not helped much. So lets look at the parameters of the first model:
plot(precis(u.fit_FE, depth = 2))
precis(u.fit_FE, depth = 2) %>% round(2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a -2.58 0.22 -2.95 -2.25 2012.44 1
## b1 0.69 0.17 0.41 0.97 2240.65 1
## b2 -0.30 0.17 -0.57 -0.03 3014.87 1
What does this all mean?
Because the parameters are log-odds ratios, we exponentiate them to get ORs. Because the probability of our outcome is (relatively) close to zero, we can interprete the OR as approximate risk ratios. Hence
The OR and relative risks that we have to deal with due to the link function make it hard to interpret the results. But if we create posterior predictions we can simply calculate contrasts.
Lets look at the data we used to generate posterior predictions:
sim.data =
unique(as.data.frame(u.fit_FE@data)[, c("cMedu","failures","Medu")])
sim.data = sim.data[order(sim.data$cMedu,sim.data$failures),]
rownames(sim.data) = NULL
sim.data
## cMedu failures Medu
## 1 -1.5 0 1
## 2 -1.5 1 1
## 3 -1.5 2 1
## 4 -1.5 3 1
## 5 -0.5 0 2
## 6 -0.5 1 2
## 7 -0.5 2 2
## 8 -0.5 3 2
## 9 0.5 0 3
## 10 0.5 1 3
## 11 0.5 2 3
## 12 0.5 3 3
## 13 1.5 0 4
## 14 1.5 1 4
## 15 1.5 2 4
If we now generate the posterior predictions, we get a matrix in
which each column corresponds to one row in sim.data
pp = link_ulam(
u.fit_FE,
data = sim.data)
dim(pp)
## [1] 4000 15
A simple question would be what the effect of having one vs zero
previous failures is when maternal educational level is 1. We simply
look up in the sim.data table that we need to compare
columns one and two in the posterior predictions for this. In the same
manner we can do this for maternal educational level 2.
# effect of prior failure for Medu = 1
delta.L1 = pp[,2] - pp[, 1]
# effect of prior failure for Medu = 2
delta.L2 = pp[,6] - pp[, 5]
# plot data
xlim = range(c(delta.L1,delta.L2))
breaks = seq(xlim[1]-.01,xlim[2]+.01, length.out = 25)
set_par(mfrow = c(1,3), cex = 1.5)
hist(delta.L1, main = "", xlim = xlim, breaks = breaks)
abline(v = 0, col = "red", lty = 3)
hist(delta.L2, main = "", xlim = xlim, breaks = breaks)
abline(v = 0, col = "red", lty = 3)
hist(delta.L2-delta.L1, main = "")
abline(v = 0, col = "red", lty = 3)
The bottom histogram shows an interaction effect on the scale of risk differences, even though our model has no interaction term! This is due to the exponent in the link function!
This interaction effect can also be understood by remembering that odds ratios are relative effects, i.e. the effect of a one unit change of a variable depends on the probability of success at the comparison value. Here is an example with numbers. Lets assume our intercepts for two groups are 1 and 2, respecitively, and the log-odds ration for the effect of interest is 1. Then we can calculate the risk differences as
inv_logit(1+1) - inv_logit(1+0)= 0.15 andinv_logit(2+1) - inv_logit(2+0)= 0.07, and see that the absolute effect of the logg odds ratio depends on the starting point. Calculateinv_logit(1+1) / inv_logit(1+0)andinv_logit(2+1) / inv_logit(2+0)and compare the results. Therefore, the meaningfulness of a result in log odds ratio can only be understood based on some background knnowledge. Note that the dependence of the absolute effect on the comparison value also explains the “interaction with oneself” described in the chapter.
Here is another example: If we want to see the difference in fail probability between maternal education levels 2 and 3, we calculate one vector of samples with fail probabilities at level 1, one for level 4, and subtract them:
fail.level1 = rowMeans(pp[,1:4])
fail.level4 = rowMeans(pp[,13:15])
delta = fail.level4 - fail.level1
set_par( mar=c(3,3,.5,.5), cex = 1.5)
hist(delta, breaks = 25, main = "")
However, this analysis assumed that the pupils with different number of previously failed classes are equally frequent, which is not the case. So we need to weight with these probabilities:
M1_p.failures = prop.table(table(df$failures[df$Medu == 1]))
M4_p.failures = prop.table(table(df$failures[df$Medu == 4]))
fail.level1 = pp[,1:4] %*% M1_p.failures
fail.level4 = pp[,13:15] %*% M4_p.failures
delta = fail.level4 - fail.level1
set_par( mar=c(3,3,.5,.5), cex = 1.5)
hist(delta, breaks = 25, main = "")
This are just two examples. Every calculation we can make based on the posterior prediction is legal!
The binomial regression is the general case of which the logistic regression is a special case. Both have a logistic link function, but whereas logistic regressions estimates if one trial was a “success”, binomial regressions estimate how many out of N trials were successes.
To look at the aggregated binomial regression we will look at data from Norwegian tests and ask the questions “Does the number of excused children depend on the topic English, reading or Math?”
Here is the data:
load(file = "data/NasjonalTests5.Rdata")
NatTests5
## Fylke topic N_total N_part N_Miss N_Exc Fylke_Nr topic_Nr
## 1 Viken Eng 15426 14399 209 818 1 1
## 2 Viken Math 15426 14463 207 756 1 2
## 3 Viken Read 15426 14392 155 879 1 3
## 4 Vestland Eng 7684 7128 103 453 2 1
## 5 Vestland Math 7684 7212 88 384 2 2
## 6 Vestland Read 7684 7115 108 461 2 3
## 7 Oslo Eng 6978 6455 153 370 3 1
## 8 Oslo Math 6978 6399 153 426 3 2
## 9 Oslo Read 6978 6406 125 447 3 3
## 10 Rogaland Eng 6451 6038 78 335 4 1
## 11 Rogaland Math 6451 6048 74 329 4 2
## 12 Rogaland Read 6451 5971 80 400 4 3
## 13 Trøndelag Eng 5464 5000 109 355 5 1
## 14 Trøndelag Math 5464 5030 117 317 5 2
## 15 Trøndelag Read 5464 5004 110 350 5 3
## 16 Vestfold og Telemark Eng 4863 4484 58 321 6 1
## 17 Vestfold og Telemark Math 4863 4543 53 267 6 2
## 18 Vestfold og Telemark Read 4863 4483 54 326 6 3
## 19 Agder Eng 3911 3621 63 227 7 1
## 20 Agder Math 3911 3650 62 199 7 2
## 21 Agder Read 3911 3618 54 239 7 3
## 22 Innlandet Eng 3804 3497 75 232 8 1
## 23 Innlandet Math 3804 3526 65 213 8 2
## 24 Innlandet Read 3804 3502 81 221 8 3
## 25 Møre og Romsdal Eng 3185 2908 25 252 9 1
## 26 Møre og Romsdal Math 3185 2934 28 223 9 2
## 27 Møre og Romsdal Read 3185 2865 33 287 9 3
## 28 Nordland Eng 2645 2394 74 177 10 1
## 29 Nordland Math 2645 2420 74 151 10 2
## 30 Nordland Read 2645 2375 85 185 10 3
## 31 Troms og Finnmark Eng 2607 2364 58 185 11 1
## 32 Troms og Finnmark Math 2607 2380 60 167 11 2
## 33 Troms og Finnmark Read 2607 2297 112 198 11 3
To set up a binomial model, we set up a model that
Here is the model:
NT.model =
alist(
N_Exc ~ dbinom(N_total, p),
logit(p) <- a[Fylke_Nr] + b[topic_Nr],
a[Fylke_Nr] ~ dnorm(0,1),
b[topic_Nr] ~ dnorm(0,1))
Here we are fitting the model.
u.fit_NT = ulam(
NT.model,
data = NatTests5,
log_lik = TRUE, # for model comparison
iter = 2000, # 2000 iterations, (1000 warmup)
chains = 4, # four chains, to check convergence
cores = 4, # use 4 cores in parallel
cmdstan = TRUE) # use cmdstanr not rstan
precis(u.fit_NT, depth = 2) %>% round(2)
n_eff and Rhat don’t look great. We look at traceplots for the first a and b to figure out what is going on:
library(posterior)
library(bayesplot)
draws = u.fit_NT@stanfit %>% as_draws()
draws %>% subset_draws(c("a[1]","b[1]")) %>% mcmc_trace()
The chains are not mixed as well as one would hope, and it looks a bit like the chains are correlated. We examine this with a pairs plot:
draws %>% subset_draws(c("a[1]","b[1]")) %>% mcmc_pairs()
The strong negative correlation reminds us that the Fylke intercept actually already represents the proportion of excused pupils in the first (any one of the) topics. WE have to re-formulate the model and use only two of the topics.
We update our data so that there is one dummy variable for Math and one for English tests.
NatTests5$Math = 1*(NatTests5$topic == "Math")
NatTests5$Eng = 1*(NatTests5$topic == "Eng")
And we formulate and estimate an updated model:
NT.model2 =
alist(
N_Exc ~ dbinom(N_total, p),
logit(p) <- a[Fylke_Nr] + b1*Math + b2*Eng,
a[Fylke_Nr] ~ dnorm(0,1),
b1 ~ dnorm(0,1),
b2 ~ dnorm(0,1))
u.fit_NT2 = ulam(
NT.model2,
data = NatTests5,
log_lik = TRUE, # for model comparison
iter = 2000, # 2000 iterations, (1000 warmup)
chains = 4, # four chains, to check convergence
cores = 4, # use 4 cores in parallel
cmdstan = TRUE) # use cmdstanr not rstan
precis(u.fit_NT2, depth = 2) %>% round(2)
This looks much better.
Here is a posterior predictive plot.
NatTests5$obs = NatTests5$N_Exc/NatTests5$N_total
NatTests5$x = NatTests5$Fylke_Nr + (NatTests5$topic_Nr-2)/8
set_par(mar=c(3,1,.5,.5))
with(NatTests5,
plot(obs, rev(x), cex = 2, col = topic_Nr,
pch = 16, xlim = c(0,.1),
yaxt = "n", ylab = "", xlab = "proportion excused")
)
with(NatTests5[NatTests5$topic == "Math",],
text(0, rev(x), Fylke, pos = 4)
)
legend("topright", col = 1:3,
legend = c("English","Math","Reading"), bty = "n", pch = 16)
pp = link(u.fit_NT2)
CI = apply(pp,2,PI)
arrows(y0 = rev(NatTests5$x), x0 = CI[1,], x1 = CI[2,], col = "blue", length = 0)
points(colMeans(pp), rev(NatTests5$x), cex = 1.25,
col = "blue", pch = 21, bg = "white")
What do the coefficients tell us?
exp(-2.36) = 0.09, and -2.8 for Viken, i.e. the probability
to be excused from the reading test is exp(-2.8) = 0.06.
This means that the chance to be excused from the reading test is
`exp(-2.36)/exp(-2.8) = 1.55 larger in Møre og Romsdal, compared to
Viken.Here is a plot with the log odds to be excused from the reading test by Fylke:
plot(precis(u.fit_NT2, depth = 2, pars = "a"))
yy = unique(NatTests5[,c("Fylke","Fylke_Nr")])
text(-2.4,rev(yy$Fylke_Nr), yy$Fylke,pos = 3)
Regarding the effect of topic we find that
plot(precis(u.fit_NT2, pars = c("b1","b2")))
exp(-.17) = 0.84. Because the baseline probability for
reading is low, we can simply say that the relative risk to be excused
from the math test is 0.84exp(-.08) = 0.92. Because the baseline probability for
reading is low, we can simply say that the relative risk to be excused
from the math test is 0.92P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7. The data can be downloaded here↩︎
There are also distributions with 1 or 3 or more parameters↩︎
I plugged this number from thin air, should really to a quick calculation on this.↩︎